
    Comparison of splice sites in mammals and chicken.

    We have carried out an initial analysis of the dynamics of the recent evolution of splice-site sequences on a large collection of human, rodent (mouse and rat), and chicken introns. Our results indicate that the sequences of splice sites are largely homogeneous within Tetrapoda. We have also found that orthologous splice signals between human and rodents and within rodents are more conserved than unrelated splice sites, but the additional conservation can be explained mostly by background intron conservation. In contrast, additional conservation over background is detectable in orthologous mammalian and chicken splice sites. Our results also indicate that the U2 and U12 intron classes seem to have evolved independently since the split of mammals and birds; we have not been able to find a convincing case of interconversion between these two classes in our collections of orthologous introns. Similarly, we have not found a single case of switching between AT-AC and GT-AG subtypes within U12 introns, suggesting that this event has been a rare occurrence in recent evolutionary times. Switching between GT-AG and the noncanonical GC-AG U2 subtypes, on the contrary, does not appear to be unusual; in particular, T to C mutations appear to be relatively well tolerated in GT-AG introns with very strong donor sites.

    An assessment of gene prediction accuracy in large DNA sequences

    One of the first useful products from the human genome will be a set of predicted genes. Besides its intrinsic scientific interest, the accuracy and completeness of this data set is of considerable importance for human health and medicine. Though progress has been made on computational gene identification in terms of both methods and accuracy evaluation measures, most of the sequence sets in which the programs are tested are short genomic sequences, and there is concern that these accuracy measures may not extrapolate well to larger, more challenging data sets. Given the absence of experimentally verified large genomic data sets, we constructed a semiartificial test set comprising a number of short single-gene genomic sequences with randomly generated intergenic regions. This test set, which should still present an easier problem than real human genomic sequence, mimics the ∼200 kb long BACs being sequenced. In our experiments with these longer genomic sequences, the accuracy of GENSCAN, one of the most accurate ab initio gene prediction programs, dropped significantly, although its sensitivity remained high. Conversely, the accuracy of similarity-based programs, such as GENEWISE, PROCRUSTES, and BLASTX, was not affected significantly by the presence of random intergenic sequence, but depended on the strength of the similarity to the protein homolog. As expected, the accuracy dropped if the models were built using more distant homologs, and we were able to quantitatively estimate this decline. However, the specificities of these techniques are still rather good even when the similarity is weak, which is a desirable characteristic for driving expensive follow-up experiments. Our experiments suggest that though gene prediction will improve with every new protein that is discovered and through improvements in the current set of tools, we still have a long way to go before we can decipher the precise exonic structure of every gene in the human genome using purely computational methodology.
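    The abstract above reports sensitivity and specificity without defining them. The sketch below shows one way to compute them at the nucleotide level. It assumes the convention common in gene-finding evaluations, where sensitivity is TP/(TP+FN) and specificity is TP/(TP+FP). The function name and the exon coordinates are hypothetical and are not taken from the paper.

```python
def nucleotide_accuracy(true_exons, predicted_exons):
    """Return (sensitivity, specificity) at the nucleotide level.

    Exons are (start, end) pairs with 1-based inclusive coordinates.
    Assumed convention: specificity = TP / (TP + FP), i.e. the fraction
    of predicted coding nucleotides that are truly coding.
    """
    def covered(exons):
        # Expand intervals into the set of positions they cover.
        positions = set()
        for start, end in exons:
            positions.update(range(start, end + 1))
        return positions

    true_pos = covered(true_exons)
    pred_pos = covered(predicted_exons)
    tp = len(true_pos & pred_pos)
    sensitivity = tp / len(true_pos) if true_pos else 0.0
    specificity = tp / len(pred_pos) if pred_pos else 0.0
    return sensitivity, specificity


# Toy case: the real exon is found, but an extra exon is predicted
# in what would be intergenic sequence.
sn, sp = nucleotide_accuracy([(1, 10)], [(1, 10), (21, 30)])
```

    In this toy case sensitivity stays at 1.0 while specificity falls to 0.5, the same qualitative pattern the abstract describes for ab initio prediction on long sequences.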

    Comparative gene prediction in human and mouse.

    The completion of the sequencing of the mouse genome promises to help predict human genes with greater accuracy. While current ab initio gene prediction programs are remarkably sensitive (i.e., they predict at least a fragment of most genes), their specificity is often low, predicting a large number of false-positive genes in the human genome. Sequence conservation at the protein level with the mouse genome can help eliminate some of those false positives. Here we describe SGP2, a gene prediction program that combines ab initio gene prediction with TBLASTX searches between two genome sequences to provide both sensitive and specific gene predictions. The accuracy of SGP2 when used to predict genes by comparing the human and mouse genomes is assessed on a number of data sets, including single-gene data sets, the highly curated human chromosome 22 predictions, and entire genome predictions from ENSEMBL. Results indicate that SGP2 outperforms purely ab initio gene prediction methods. Results also indicate that SGP2 works about as well with 3x shotgun data as it does with fully assembled genomes. SGP2 provides a high enough specificity that its predictions can be experimentally verified at a reasonable cost. SGP2 was used to generate a complete set of gene predictions on both the human and mouse by comparing the genomes of these two species. Our results suggest that another few thousand human and mouse genes currently not in ENSEMBL are worth verifying experimentally

    Distilling a visual network of Retinitis Pigmentosa gene-protein interactions to uncover new disease candidates

    BACKGROUND: Retinitis pigmentosa (RP) is a highly heterogeneous genetic visual disorder with more than 70 known causative genes, some of them shared with other non-syndromic retinal dystrophies (e.g. Leber congenital amaurosis, LCA). The identification of RP genes has increased steadily during the last decade, and the 30% of the cases that still remain unassigned will soon decrease after the advent of exome/genome sequencing. A considerable amount of genetic and functional data on single RD genes and mutations has been gathered, but a comprehensive view of the RP genes and their interacting partners is still very fragmentary. This is the main gap that needs to be filled in order to understand how mutations relate to progressive blinding disorders and devise effective therapies. METHODOLOGY: We have built an RP-specific network (RPGeNet) by merging data from different sources: high-throughput data from BioGRID and STRING databases, manually curated data for interactions retrieved from iHOP, as well as interactions filtered out by syntactical parsing from up-to-date abstracts and full-text papers related to the RP research field. The paths emerging when known RP genes were used as baits over the whole interactome have been analysed, and the minimal number of connections among the RP genes and their close neighbors were distilled in order to simplify the search space. CONCLUSIONS: In contrast to the analysis of single isolated genes, finding the networks linking disease genes renders powerful etiopathological insights. We here provide an interactive interface, RPGeNet, for the molecular biologist to explore the network centered on the non-syndromic and syndromic RP and LCA causative genes. By integrating tissue-specific expression levels and phenotypic data on top of that network, a more comprehensive biological view will highlight key molecular players of retinal degeneration and unveil new RP disease candidates

    Regeneration and homeostasis in planarians, a classic model of developmental biology

    Regeneration is the capacity of an organism to replace fragments lost through traumatic amputation or degeneration. New structures regenerate either through cell proliferation and de novo formation or through remodeling of pre-existing tissues. Planarians can regenerate a whole new organism from small fragments of their body, a fact that has attracted the interest of scientists throughout history. In 1814, Dalyell concluded that planarians "can be considered immortal under the blade of a knife". Regeneration in planarians requires the generation of new tissue at the wound site through cell proliferation, which produces a new undifferentiated tissue, the blastema, and the remodeling of pre-existing tissues to restore the proportions of the newly regenerated organism. Another spectacular property of planarians is their capacity to grow and degrow according to food intake. Throughout this growth/degrowth, however, correct body proportions and functions are maintained at all times thanks to homeostatic control. All this plasticity is due, at the cellular level, to the presence of a high percentage of totipotent stem cells (20-30% of all cells in an adult organism). Another fundamental property is the continuous activity of morphogenetic mechanisms, which in most other organisms normally operate only once, during development. The application of new methodologies at the cellular, molecular and genetic levels in the postgenomic era has allowed us to study developmental pathways and genes functionally in a new setting, planarian regeneration

    Planarians as a model to assess in vivo the role of matrix metalloproteinase genes during homeostasis and regeneration

    Matrix metalloproteinases (MMPs) are major executors of extracellular matrix remodeling and, consequently, play key roles in the response of cells to their microenvironment. The experimentally accessible stem cell population and the robust regenerative capabilities of planarians offer an ideal model to study how modulation of the proteolytic system in the extracellular environment affects cell behavior in vivo. Genome-wide identification of Schmidtea mediterranea MMPs reveals that planarians possess four mmp-like genes. Two of them (mmp1 and mmp2) are strongly expressed in a subset of secretory cells and encode putative matrilysins. The other genes (mt-mmpA and mt-mmpB) are widely expressed in postmitotic cells and appear structurally related to membrane-type MMPs. These genes are conserved in the planarian Dugesia japonica. Here we explore the role of the planarian mmp genes by RNA interference (RNAi) during tissue homeostasis and regeneration. Our analyses identify essential functions for two of them. Following inhibition of mmp1, planarians display dramatic disruption of tissue architecture and a significant decrease in cell death. These results suggest that mmp1 controls tissue turnover, modulating survival of postmitotic cells. Unexpectedly, the ability to regenerate is unaffected by mmp1(RNAi). Silencing of mt-mmpA alters tissue integrity and delays blastema growth, without affecting proliferation of stem cells. Our data support the possibility that the activity of this protease modulates cell migration and regulates anoikis, with a consequent pivotal role in tissue homeostasis and regeneration. Our data provide evidence of the involvement of specific MMPs in tissue homeostasis and regeneration and demonstrate that the behavior of planarian stem cells is critically dependent on the microenvironment surrounding these cells. Studying MMP function in the planarian model provides evidence on how individual proteases work in vivo in adult tissues. These results have high potential to generate significant information for the development of regenerative and anti-cancer therapies.

    Complex selection on 5' splice sites in intron-rich organisms

    In contrast to the typically streamlined genomes of prokaryotes, many eukaryotic genomes are riddled with long intergenic regions, spliceosomal introns, and repetitive elements. What explains the persistence of these and other seemingly suboptimal structures? There are three general hypotheses: (1) the structures in question are not actually suboptimal but optimal, being favored by selection, for unknown reasons; (2) the structures are not suboptimal, but of (essentially) equal fitness to 'optimal' ones; or (3) the structures are truly suboptimal, but selection is too weak to systematically eliminate them. The 5' splice sites of introns offer a rare opportunity to directly test these hypotheses. Intron-poor species show a clear consensus splice site; most introns begin with the same six nucleotide sequence (typically GTAAGT or GTATGT), indicating efficient selection for this consensus sequence. In contrast, intron-rich species have much less pronounced boundary consensus sequences, and only small minorities of introns in intron-rich species share the same boundary sequence. We studied rates of evolutionary change of 5' splice sites in three groups of closely related intron-rich species--three primates, five Drosophila species, and four Cryptococcus fungi. Surprisingly, the results indicate that changes from consensus-to-variant nucleotides are generally disfavored by selection, but that changes from variant to consensus are neither favored nor disfavored. This evolutionary pattern is consistent with selective differences across introns, for instance, due to compensatory changes at other sites within the gene, which compensate for the otherwise suboptimal consensus-to-variant changes in splice boundaries
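    The contrast the abstract draws between a clear consensus and a weak one can be made concrete by tabulating the first six intron nucleotides. The sketch below is a minimal illustration and not the authors' method. The function names and the toy sequences are hypothetical.

```python
from collections import Counter


def donor_site_counts(introns, length=6):
    """Count the first `length` nucleotides of each intron sequence."""
    return Counter(
        seq[:length].upper() for seq in introns if len(seq) >= length
    )


def consensus_fraction(introns, length=6):
    """Return the most common boundary sequence and the fraction of
    introns that start with it."""
    counts = donor_site_counts(introns, length)
    if not counts:
        return None, 0.0
    site, n = counts.most_common(1)[0]
    return site, n / sum(counts.values())


# Toy intron sequences (made up for illustration only).
toy_introns = ["GTAAGTACGT", "GTAAGTTTTT", "GTATGTCCCC", "gtaagtaaaa"]
site, fraction = consensus_fraction(toy_introns)
```

    On real data, a fraction close to 1 would correspond to the intron-poor pattern described above, and a small fraction to the intron-rich one.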

    Digital Gene Expression approach over multiple RNA-Seq data sets to detect neoblast transcriptional changes in Schmidtea mediterranea

    The freshwater planarian Schmidtea mediterranea is recognised as a valuable model for research into adult stem cells and regeneration. With the advent of high-throughput sequencing technologies, it has become feasible to undertake detailed transcriptional analysis of its unique stem cell population, the neoblasts. Nonetheless, a reliable reference for this type of study is still lacking. Taking advantage of digital gene expression (DGE) sequencing technology, we compare all the available transcriptomes for S. mediterranea and improve their annotation. These results are accessible via the web for the community of researchers. Using the quantitative nature of DGE, we describe the transcriptional profile of neoblasts and present 42 new neoblast genes, including several cancer-related genes and transcription factors. Furthermore, we describe in detail the Smed-meis-like gene and the three Nuclear Factor Y subunits Smed-nf-YA, Smed-nf-YB-2 and Smed-nf-YC. DGE is a valuable tool for gene discovery, quantification and annotation. The application of DGE in S. mediterranea confirms the planarian stem cells or neoblasts as a complex population of pluripotent and multipotent cells regulated by a mixture of transcription factors and cancer-related genes

    The nervous system of Xenacoelomorpha: a genomic perspective

    Xenacoelomorpha is, most probably, a monophyletic group that includes three clades: Acoela, Nemertodermatida and Xenoturbellida. The group still has contentious phylogenetic affinities; though most authors place it as the sister group of the remaining bilaterians, some would include it as a fourth phylum within the Deuterostomia. Over the past few years, our group, along with others, has undertaken a systematic study of the microscopic anatomy of these worms; our main aim is to understand the structure and development of the nervous system. This research plan has been aided by the use of molecular/developmental tools, the most important of which has been the sequencing of the complete genomes and transcriptomes of different members of the three clades. The data obtained has been used to analyse the evolutionary history of gene families and to study their expression patterns during development, in both space and time. A major focus of our research is the origin of 'cephalized' (centralized) nervous systems. How complex brains are assembled from simpler neuronal arrays has been a matter of intense debate for at least 100 years. We are now tackling this issue using Xenacoelomorpha models. These represent an ideal system for this work because the members of the three clades have nervous systems with different degrees of cephalization; from the relatively simple sub-epithelial net of Xenoturbella to the compact brain of acoels. How this process of 'progressive' cephalization is reflected in the genomes or transcriptomes of these three groups of animals is the subject of this paper

    RPGeNet v2.0: expanding the universe of retinal disease gene interactions network

    RPGeNet offers researchers a user-friendly queryable tool to visualize the interactome network of visual disorder genes, thus enabling the identification of new potential causative genes and the assignment of novel candidates to specific retinal or cellular pathways. This can be highly relevant for clinical applications as retinal dystrophies affect 1:3000 people worldwide, and the causative genes are still unknown for 30% of the patients. RPGeNet is a refined interaction network interface that limits its skeleton network to the shortest paths between each and every known causative gene of inherited syndromic and non-syndromic retinal dystrophies. RPGeNet integrates interaction information from STRING, BioGRID and PPaxe, along with retina-specific expression data and associated genetic variants, over a Cytoscape.js web interface. For the new version, RPGeNet v2.0, the database engine was migrated to the Neo4j graph database manager, which speeds up the initial queries and can handle whole interactome data for new ways to query the network. Further, user facilities have been introduced, such as the capability to save and restore a researcher's customized network layout, along with novel features that facilitate navigation and data projection on the network explorer interface. Responsiveness has been further improved by transferring some functionality to the client side
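    The idea of a skeleton network limited to the shortest paths between causative genes can be sketched with a breadth-first search over an unweighted interaction graph. This is a minimal plain-Python illustration and not RPGeNet's implementation, which the abstract says runs on Neo4j. It keeps one shortest path per gene pair rather than all of them, and the gene labels and function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def shortest_path(graph, source, target):
    """Breadth-first search; returns one shortest path as a list of
    nodes, or None if the target cannot be reached."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in graph.get(node, ()):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


def skeleton_nodes(graph, driver_genes):
    """Union of the nodes on one shortest path between every pair of
    driver (causative) genes. Unreachable pairs add nothing."""
    nodes = set(driver_genes)
    for a, b in combinations(sorted(driver_genes), 2):
        path = shortest_path(graph, a, b)
        if path:
            nodes.update(path)
    return nodes


# Toy undirected interactome as adjacency lists (labels are made up).
toy_graph = {
    "A": ["X", "Y"], "X": ["A", "B", "W"], "B": ["X", "Z"],
    "Y": ["A", "Z"], "Z": ["Y", "B"], "W": ["X"], "C": [],
}
skeleton = skeleton_nodes(toy_graph, {"A", "B", "C"})
```

    In this toy graph only X joins the drivers in the skeleton: Y and Z lie on a longer A to B route, and W is a dead end.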